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(54) Title: NUCLEIC ACID BINDING PROTEINS 
(57) Abstract 



The invention provides a method for preparing a nucleic acid binding protein of the
Cys2-His2 zinc finger class capable of binding to a nucleic acid quadruplet in a target
nucleic acid sequence, wherein binding to base 4 of the quadruplet by an α-helical zinc
finger nucleic acid binding motif in the protein is determined as follows: if base 4 in the
quadruplet is A, then position +6 in the α-helix is Gln and position ++2 is not Asp; and
if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as
long as position ++2 in the α-helix is not Asp.
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Nucleic Acid Binding Proteins

The present invention relates to nucleic acid binding proteins. In particular, the invention 
relates to a method for designing a protein which is capable of binding to any predefined 
nucleic acid sequence. 

Protein-nucleic acid recognition is a commonplace phenomenon which is central to a large 
number of biomolecular control mechanisms which regulate the functioning of eukaryotic 
and prokaryotic cells. For instance, protein-DNA interactions form the basis of the
regulation of gene expression and are thus one of the subjects most widely studied by
molecular biologists.

A wealth of biochemical and structural information explains the details of protein-DNA
recognition in numerous instances, to the extent that general principles of recognition have
emerged. Many DNA-binding proteins contain independently folded domains for the
recognition of DNA, and these domains in turn belong to a large number of structural
families, such as the leucine zipper, the "helix-turn-helix" and zinc finger families.

Despite the great variety of structural domains, the specificity of the interactions observed
to date between protein and DNA most often derives from the complementarity of the
surfaces of a protein α-helix and the major groove of DNA [Klug, (1993) Gene 135:83-92].
In light of the recurring physical interaction of α-helix and major groove, the tantalising
possibility arises that the contacts between particular amino acids and DNA bases could be
described by a simple set of rules; in effect a stereochemical recognition code which relates
protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by all

families interact with the major groove of DNA, thus precluding similarities in patterns of
recognition. The majority of known DNA-binding motifs are not particularly versatile, and
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elucidated for the interactions of classical zinc fingers with nucleic acid. In this case a 
pattern of rules is provided which covers binding to all nucleic acid sequences. 

According to a first aspect of the present invention, therefore, we provide a method for
preparing a nucleic acid binding protein of the Cys2-His2 zinc finger class capable of
binding to a nucleic acid quadruplet in a target nucleic acid sequence, wherein binding to
base 4 of the quadruplet by an α-helical zinc finger nucleic acid binding motif in the protein
is determined as follows:

a) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2 is not
Asp;

b) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as
long as position ++2 in the α-helix is not Asp.

Preferably, binding to base 4 of the quadruplet by an α-helical zinc finger nucleic acid
binding motif in the protein is additionally determined as follows:

c) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6
is Ser or Thr and position ++2 is Asp;

d) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and
position ++2 is Asp.
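
By way of illustration only, the rules for base 4 may be written as a lookup table. The following Python sketch is not part of the specification: the names `BASE4_RULES` and `base4_ok` are hypothetical, residues are given as three-letter codes, and it is assumed that no constraint applies to position ++2 when position +6 is Arg, since none is stated.

```python
ANY = object()  # sentinel meaning "any residue" or "no constraint stated"

# For each base 4, a list of alternatives:
# (residues allowed at +6, whether ++2 must be Asp)
BASE4_RULES = {
    "A": [({"Gln"}, False)],                        # +6 Gln, ++2 not Asp
    "C": [(ANY, False)],                            # +6 any, ++2 not Asp
    "G": [({"Arg"}, ANY), ({"Ser", "Thr"}, True)],  # +6 Arg; or +6 Ser/Thr with ++2 Asp
    "T": [({"Ser", "Thr"}, True)],                  # +6 Ser/Thr with ++2 Asp
}


def base4_ok(base, plus6, plusplus2):
    """Return True if the (+6, ++2) residue pair satisfies a rule for base 4."""
    for allowed_plus6, asp_required in BASE4_RULES[base]:
        plus6_ok = allowed_plus6 is ANY or plus6 in allowed_plus6
        asp_ok = asp_required is ANY or (plusplus2 == "Asp") == asp_required
        if plus6_ok and asp_ok:
            return True
    return False
```

For example, `base4_ok("A", "Gln", "Ser")` is satisfied, whereas `base4_ok("A", "Gln", "Asp")` is not.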

The quadruplets specified in the present invention are overlapping, such that, when read 3'
to 5' on the -strand of the nucleic acid, base 4 of the first quadruplet is base 1 of the
second, and so on. Accordingly, in the present application, the bases of each quadruplet
are referred to by number, from 1 to 4, 1 being the 3' base and 4 being the 5' base.
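
The overlap may be illustrated by a short Python sketch. It is offered as an aid to understanding only; the function name `overlapping_quadruplets` and the ten-base sequence in the usage note are hypothetical. The input is assumed to be the -strand written 3' to 5', so that each quadruplet is returned with base 1 first and base 4 last.

```python
def overlapping_quadruplets(minus_strand_3to5):
    """Split a -strand sequence (written 3'->5') into overlapping quadruplets.

    Consecutive quadruplets share one base: base 4 of one quadruplet
    is base 1 of the next, so n quadruplets span 3n + 1 bases.
    """
    seq = minus_strand_3to5.upper()
    if len(seq) < 4 or (len(seq) - 1) % 3 != 0:
        raise ValueError("sequence length must be 3n + 1 with n >= 1")
    # Step by three bases so that each window re-uses the last base of the previous one.
    return [seq[i:i + 4] for i in range(0, len(seq) - 1, 3)]
```

A ten-base site such as `"GCCGATTGCA"` yields three quadruplets, `GCCG`, `GATT` and `TGCA`, corresponding to three fingers.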

refers to the residue in the framework structure immediately preceding the α-helix in
a Cys2-His2 zinc finger polypeptide.
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acid, since it is this strand which is aligned 3' to 5'. These conventions are followed in the
nomenclature used herein. It should be noted, however, that in nature certain fingers, such
as finger 4 of the protein GLI, bind to the +strand of nucleic acid: see Suzuki et al.,
(1994) NAR 22:3397-3405 and Pavletich and Pabo, (1993) Science 261:1701-1707. The
incorporation of such fingers into nucleic acid binding molecules according to the invention 
is envisaged. 

The present invention provides a solution to a problem hitherto unaddressed in the art, by
permitting the rational design of polypeptides which will bind nucleic acid quadruplets
whose 5' residue is other than G. In particular, the invention provides for the first time a
solution for the design of polypeptides for binding quadruplets containing 5' A or C.



Position +6 in the α-helix is generally responsible for the interaction with the base 4 of a
given quadruplet in the target. According to the present invention, an A at base 4 interacts
with a Glutamine (Gln or Q) at position +6, while a C at base 4 will interact with any
amino acid provided that position ++2 is not Aspartic acid (Asp or D).

The present invention concerns a method for preparing nucleic acid binding proteins which
are capable of binding nucleic acid. Thus, whilst the solutions provided by the invention
will result in a functional nucleic acid binding molecule, it is possible that naturally-occurring
zinc finger nucleic acid binding molecules may not follow some or all of the
rules provided herein. This does not matter, because the aim of the invention is to permit
the design of the nucleic acid binding molecules on the basis of nucleic acid sequence, and
not the converse. This is why the rules, in certain instances, provide for a number of
possibilities for any given residue. In other instances, alternative residues to those given
may be possible. The present invention, thus, does not seek to provide every solution for
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position +2 in the helix is responsible for determining the binding to base 1 of the
quadruplet. In doing so, it cooperates synergistically with position +6, which determines
binding at base 4 in the quadruplet, bases 1 and 4 being overlapping in adjacent 
quadruplets. 

A zinc finger binding motif is a structure well known to those in the art and defined in, for
example, Miller et al., (1985) EMBO J. 4:1609-1614; Berg (1988) PNAS (USA) 85:99-102;
Lee et al., (1989) Science 245:635-637; see International patent applications WO
96/06166 and WO 96/32475, corresponding to USSN 08/422,107, incorporated herein by
reference.

As used herein, "nucleic acid" refers to both RNA and DNA, constructed from natural
nucleic acid bases or synthetic bases, or mixtures thereof. Preferably, however, the 
binding proteins of the invention are DNA binding proteins. 

In general, a preferred zinc finger framework has the structure:

(A)  X_{0-2} C X_{1-5} C X_{9-14} H X_{3-6} H/C

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X.

In a preferred aspect of the present invention, zinc finger nucleic acid binding motifs may
be represented as motifs having the following primary structure: 

(B)  C X_{2-4} C X_{2-3} F X^c X X X X L X X H X X X^b H - linker



of 2 or 4, or 2 or 3, amino acids, respectively. The Cys and His residues, which together
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Preferably, the linker is T-G-E-K or T-G-E-K-P. 
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As set out above, the major binding interactions occur with amino acids -1, +2, +3 and
+6. Amino acids +4 and +7 are largely invariant. The remaining amino acids may be
essentially any amino acids. Preferably, position +9 is occupied by Arg or Lys.
Advantageously, positions +1, +5 and +8 are not hydrophobic amino acids, that is to say
are not Phe, Trp or Tyr.

In a most preferred aspect, therefore, bringing together the above, the invention allows the 
definition of every residue in a zinc finger nucleic acid binding motif which will bind
specifically to a given nucleic acid quadruplet.

The code provided by the present invention is not entirely rigid; certain choices are
provided. For example, positions +1, +5 and +8 may have any amino acid allocation,
whilst other positions may have certain options: for example, the present rules provide
that, for binding to a central T residue, any one of Ala, Ser or Val may be used at +3. In
its broadest sense, therefore, the present invention provides a very large number of proteins
which are capable of binding to every defined target nucleic acid quadruplet.

Preferably, however, the number of possibilities may be significantly reduced. For
example, the non-critical residues +1, +5 and +8 may be occupied by the residues Lys,
Thr and Gln respectively as a default option. In the case of the other choices, for example,
the first-given option may be employed as a default. Thus, the code according to the
present invention allows the design of a single, defined polypeptide (a "default"
polypeptide) which will bind to its target quadruplet.
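
The construction of such a "default" helix may be sketched as follows. This Python fragment is illustrative only. The names `DEFAULTS` and `default_helix` are hypothetical; the residues at +4 (Leu) and +7 (His) are assumed from the framework and from the finger sequences given in the examples, and Arg is taken as the first-given option at +9. The recognition residues at -1, +2, +3 and +6 are supplied by the caller from the rules.

```python
POSITIONS = [-1, 1, 2, 3, 4, 5, 6, 7, 8, 9]   # helix positions, N- to C-terminal

# One-letter codes. +1/+5/+8 defaults (Lys, Thr, Gln) come from the text;
# +4 = Leu and +7 = His are assumed from the framework; +9 = Arg (Arg or Lys preferred).
DEFAULTS = {1: "K", 5: "T", 8: "Q", 4: "L", 7: "H", 9: "R"}

RECOGNITION_POSITIONS = {-1, 2, 3, 6}


def default_helix(recognition):
    """Build the ten-residue helix region from chosen recognition residues.

    `recognition` maps positions -1, 2, 3 and 6 to one-letter residue codes.
    """
    if set(recognition) != RECOGNITION_POSITIONS:
        raise ValueError("supply residues for positions -1, 2, 3 and 6 only")
    residues = {**DEFAULTS, **recognition}
    return "".join(residues[p] for p in POSITIONS)
```

For instance, `default_helix({-1: "D", 2: "S", 3: "S", 6: "R"})` gives `DKSSLTRHQR`.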

In a further aspect of the present invention, there is provided a method for preparing a
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residues known to affect binding to bases at which the natural and desired targets differ. 
Otherwise, mutation of the model fingers should be concentrated upon residues -1, +2, +3
and +6 as provided for in the foregoing rules.

In order to produce a binding protein having improved binding, moreover, the rules 
provided by the present invention may be supplemented by physical or virtual modelling of
the protein/nucleic acid interface in order to assist in residue selection. 

Zinc finger binding motifs designed according to the invention may be combined into
nucleic acid binding proteins having a multiplicity of zinc fingers. Preferably, the proteins
have at least two zinc fingers. In nature, zinc finger binding proteins commonly have at
least three zinc fingers, although two-zinc finger proteins such as Tramtrack are known. 
The presence of at least three zinc fingers is preferred. Binding proteins may be 
constructed by joining the required fingers end to end, N-terminus to C-terminus. 
Preferably, this is effected by joining together the relevant nucleic acid coding sequences 
encoding the zinc fingers to produce a composite coding sequence encoding the entire 
binding protein. The invention therefore provides a method for producing a nucleic acid 
binding protein as defined above, wherein the nucleic acid binding protein is constructed by 
recombinant DNA technology, the method comprising the steps of: 

a) preparing a nucleic acid coding sequence encoding two or more zinc finger binding
motifs as defined above, placed N-terminus to C-terminus;

b) inserting the nucleic acid sequence into a suitable expression vector; and 

c) expressing the nucleic acid sequence in a host organism in order to obtain the nucleic 
acid binding protein. 
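
At the level of the amino acid sequence, the end-to-end arrangement of step a) may be sketched as below. This Python fragment is purely illustrative: the function name `join_fingers` is hypothetical, the fingers are assumed to be supplied without their linker, and T-G-E-K-P is used as one of the two preferred linkers. Whether the final finger carries a linker is not specified here, so it is left as a parameter.

```python
def join_fingers(fingers, linker="TGEKP", trailing_linker=False):
    """Concatenate zinc finger motifs N-terminus to C-terminus.

    `fingers` is a list of one-letter amino acid sequences, ordered from the
    N-terminal finger to the C-terminal finger, each without its linker.
    """
    if len(fingers) < 2:
        raise ValueError("at least two zinc fingers are required")
    protein = linker.join(fingers)
    # Optionally keep the linker after the last finger as well.
    return protein + linker if trailing_linker else protein
```

The corresponding coding sequence would then be prepared from the composite amino acid sequence by conventional means.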



The nucleic acid encoding the nucleic acid binding protein according to the invention can 

As used herein, vector (or "plasmid")
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can be amplified by PCR and be directly transfected into the host cells without any
replication component. 

Advantageously, an expression and cloning vector may contain a selection gene, also
referred to as a selectable marker. This gene encodes a protein necessary for the survival or
growth of transformed host cells grown in a selective culture medium. Host cells not
transformed with the vector containing the selection gene will not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins, e.g. ampicillin, neomycin, methotrexate or tetracycline, complement
auxotrophic deficiencies, or supply critical nutrients not available from complex media.

As to a selective gene marker appropriate for yeast, any marker gene can be used which
facilitates the selection for transformants due to the phenotypic expression of the marker
gene. Suitable markers for yeast are, for example, those conferring resistance to antibiotics
G418, hygromycin or bleomycin, or provide for prototrophy in an auxotrophic yeast
mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic marker
and an E. coli origin of replication are advantageously included. These can be obtained
from E. coli plasmids, such as pBR322, Bluescript vector or a pUC plasmid, e.g. pUC18
or pUC19, which contain both E. coli replication origin and E. coli genetic marker
conferring resistance to antibiotics, such as ampicillin.

Suitable selectable markers for mammalian cells are those that enable the identification of
cells competent to take up nucleic acid binding protein nucleic acid, such as dihydrofolate
reductase (DHFR, methotrexate resistance), thymidine kinase, or genes conferring

transformants placed under

GS) marker, selection pressure can be imposed by culturing the transformants under

which the pressure is progressively increased, thereby leading to amplification
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UV5 promoter. This system has been employed successfully for over-production of many
proteins. Alternatively the polymerase gene may be introduced on a lambda phage by
infection with an int- phage such as the CE6 phage which is commercially available
(Novagen, Madison, USA). Other vectors include vectors containing the lambda PL
promoter such as pLEX (Invitrogen, NL), vectors containing the trc promoters such as
pTrcHisXpress™ (Invitrogen) or pTrc99 (Pharmacia Biotech, SE) or vectors containing
the tac promoter such as pKK223-3 (Pharmacia Biotech) or pMAL (New England Biolabs,
MA, USA).

Moreover, the nucleic acid binding protein gene according to the invention preferably
includes a secretion sequence in order to facilitate secretion of the polypeptide from
bacterial hosts, such that it will be produced as a soluble native peptide rather than in an
inclusion body. The peptide may be recovered from the bacterial periplasmic space, or the
culture medium, as appropriate.

Suitable promoting sequences for use with yeast hosts may be regulated or constitutive and
are preferably derived from a highly expressed yeast gene, especially a Saccharomyces
cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or ADHII gene, the acid
phosphatase (PHO5) gene, a promoter of the yeast mating pheromone genes coding for the
a- or α-factor or a promoter derived from a gene encoding a glycolytic enzyme such as the
promoter of the enolase, glyceraldehyde-3-phosphate dehydrogenase (GAP), 3-phosphoglycerate
kinase (PGK), hexokinase, pyruvate decarboxylase, phosphofructokinase,
glucose-6-phosphate isomerase, 3-phosphoglycerate mutase, pyruvate kinase, triose
phosphate isomerase, phosphoglucose isomerase or glucokinase genes, or a promoter from
the TATA binding protein (TBP) gene can be used. Furthermore, it is possible to use
hybrid promoters comprising upstream activation sequences (UAS) of one yeast gene and
downstream promoter elements including a functional TATA box of another yeast gene, for

hybrid promoter). A suitable constitutive PHO5 promoter is e.g. a shortened acid
phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as the
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contain nucleotide segments transcribed as polyadenylated fragments in the untranslated
portion of the mRNA encoding nucleic acid binding protein. 

An expression vector includes any vector capable of expressing nucleic acid binding protein
nucleic acids that are operatively linked with regulatory sequences, such as promoter
regions, that are capable of expression of such DNAs. Thus, an expression vector refers to
a recombinant DNA or RNA construct, such as a plasmid, a phage, recombinant virus or
other vector, that upon introduction into an appropriate host cell, results in expression of
the cloned DNA. Appropriate expression vectors are well known to those with ordinary
skill in the art and include those that are replicable in eukaryotic and/or prokaryotic cells
and those that remain episomal or those which integrate into the host cell genome. For
example, DNAs encoding nucleic acid binding protein may be inserted into a vector
suitable for expression of cDNAs in mammalian cells, e.g. a CMV enhancer-based vector
such as pEVRF (Matthias, et al., (1989) NAR 17, 6418).

Particularly useful for practising the present invention are expression vectors that provide
for the transient expression of DNA encoding nucleic acid binding protein in mammalian
cells. Transient expression usually involves the use of an expression vector that is able to
replicate efficiently in a host cell, such that the host cell accumulates many copies of the
expression vector, and, in turn, synthesises high levels of nucleic acid binding protein. For
the purposes of the present invention, transient expression systems are useful e.g. for
identifying nucleic acid binding protein mutants, to identify potential phosphorylation sites,
or to characterise functional domains of the protein.

Construction of vectors according to the invention employs conventional ligation 
techniques. Isolated plasmids or DNA fragments are cleaved, tailored, and religated in the 



host cells, and performing analyses for assessing nucleic acid binding protein expression

Gene presence, amplification and/or
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binding protein may be empirically determined and optimised for a particular cell and 
assay. 

Host cells are transfected or, preferably, transformed with the above-captioned expression
or cloning vectors of this invention and cultured in conventional nutrient media modified as
appropriate for inducing promoters, selecting transformants, or amplifying the genes
encoding the desired sequences. Heterologous DNA may be introduced into host cells by
any method known in the art, such as transfection with a vector encoding a heterologous
DNA by the calcium phosphate coprecipitation technique or by electroporation. Numerous
methods of transfection are known to the skilled worker in the field. Successful transfection
is generally recognised when any indication of the operation of this vector occurs in the
host cell. Transformation is achieved using standard techniques appropriate to the particular
host cells used.

Incorporation of cloned DNA into a suitable expression vector, transfection of eukaryotic
cells with a plasmid vector or a combination of plasmid vectors, each encoding one or more
distinct genes or with linear DNA, and selection of transfected cells are well known in the
art (see, e.g. Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second
Edition, Cold Spring Harbor Laboratory Press).

Transfected or transformed cells are cultured using media and culturing methods known in
the art, preferably under conditions whereby the nucleic acid binding protein encoded by
the DNA is expressed. The composition of suitable media is known to those in the art, so
that they can be readily prepared. Suitable culturing media are also commercially available.

In a further aspect, the invention also provides means by which the binding of the protein
designed according to the rules can be improved by randomising the proteins and selecting

the invention may be subjected to limited randomisation and subsequent selection, such as
by phage display, in order to optimise the binding characteristics of the molecule.



WO 98/53060 PCT/GB98/01516

affinity purification. The phage are then amplified by passage through a bacterial host, and
subjected to further rounds of selection and amplification in order to enrich the mutant pool
for the desired phage and eventually isolate the preferred clone(s). Detailed methodology
for phage display is known in the art and set forth, for example, in US Patent 5,223,409;
Choo and Klug, (1995) Current Opinion in Biotechnology 6:431-436; Smith, (1985)
Science 228:1315-1317; and McCafferty et al., (1990) Nature 348:552-554; all
incorporated herein by reference. Vector systems and kits for phage display are available
commercially, for example from Pharmacia.

Randomisation of the zinc finger binding motifs produced according to the invention is
preferably directed to those residues where the code provided herein gives a choice of
residues. For example, therefore, positions +1, +5 and +8 are advantageously
randomised, whilst preferably avoiding hydrophobic amino acids; positions involved in
binding to the nucleic acid, notably -1, +2, +3 and +6, may be randomised also,
preferably within the choices provided by the rules of the present invention.
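
The size of such a library can be estimated by simple enumeration. The following Python sketch is given for illustration only; it assumes that the hydrophobic amino acids to be avoided are Phe, Trp and Tyr, that all three of positions +1, +5 and +8 are randomised over the remaining residues, and the names `ALLOWED` and `variants` are hypothetical.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # the twenty standard residues, one-letter codes
EXCLUDED = set("FWY")                  # Phe, Trp, Tyr (assumed set of residues to avoid)
ALLOWED = [aa for aa in AMINO_ACIDS if aa not in EXCLUDED]


def variants(positions=(1, 5, 8)):
    """Yield every assignment of an allowed residue to each randomised position."""
    for combo in product(ALLOWED, repeat=len(positions)):
        yield dict(zip(positions, combo))


# 17 allowed residues at each of three positions
library_size = len(ALLOWED) ** 3
```

Under these assumptions the library comprises 17 x 17 x 17 = 4913 protein variants, well within the capacity of a phage display selection.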

Preferably, therefore, the "default" protein produced according to the rules provided by the 
invention can be improved by subjecting the protein to one or more rounds of 
randomisation and selection within the specified parameters. 

Nucleic acid binding proteins according to the invention may be employed in a wide variety
of applications, including diagnostics and as research tools. Advantageously, they may be
employed as diagnostic tools for identifying the presence of nucleic acid molecules in a
complex mixture. Nucleic acid binding molecules according to the invention can
differentiate single base pair changes in target nucleic acid molecules.



a) preparing a nucleic acid binding protein by the method set forth above which is specific




nucleic acid cleaving domain is fused to a nucleic acid binding domain comprising a zinc
finger as described herein.

The invention is described below, for the purpose of illustration only, in the following
examples, with reference to the figures, in which: 

Figure 1 illustrates the design of a zinc finger binding protein specific for a G12V mutant 
ras oncogene; 

Figure 2 illustrates the binding specificity of the binding protein for the oncogene as 
opposed to the wild-type ras sequence; and 

Figure 3 illustrates the results of an ELISA assay performed using the anti-ras binding 
protein with both wild-type and mutant target nucleic acid sequences; 

Figure 4 illustrates interactions between the Zif268 DNA-binding domain and DNA. (a)
Schematic diagram of modular recognition between the three zinc fingers of Zif268 and
triplet subsites of an optimised DNA binding site. Straight arrows indicate the
stereochemical juxtapositioning of recognition residues with bases of the contacted G-rich
DNA strand. Note that since the N-terminal finger contacts the 3' end of the DNA and the
C-terminal finger the 5' end, binding to the G-rich strand is said to be antiparallel. (b)
View of Zif268 finger 3 bound to DNA, showing the possibility of interaction with both
DNA strands. Co-ordinates from Pavletich & Pabo, (1991) Science 252:809-817. (c) The
potential hydrogen bonding network between bases on both strands of the DNA and
positions -1 (Arg) and 2 (Asp) of finger 3 (Pavletich & Pabo 1991). (d) Schematic diagram



the parallel DNA strand (shown by curly arrows) mean that each finger binds overlapping. 



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Example 1 

Construction of a zinc finger protein

The target selected for the zinc finger nucleic acid binding protein is the activating point
mutation of the human EJ bladder carcinoma ras oncogene, which was the first DNA lesion
reported to confer transforming properties on a cellular proto-oncogene. Since the original
discovery, ras gene mutations have been found to occur at high frequencies in a variety of
human cancers and are established targets for the diagnosis of oncogenesis at early stages of
tumour growth.



The EJ bladder carcinoma mutation is a single nucleotide change in codon 12 of H-ras,
which results in a mutation from GGC to GTC at this position. A zinc finger peptide is
designed to bind a 10bp DNA site assigned in the noncoding strand of the mutant ras gene,
such that three fingers contact 'anticodons' 10, 11 and 12 in series, as shown in Fig. 1,
plus the 5' preceding G (on the +strand of the DNA). The rationale of this assignment
takes into account the fact that zinc fingers make most contacts to one DNA strand, and the
mutant noncoding strand carries an adenine which can be strongly discriminated from the
cytosine present in the wild-type ras, by a bidentate contact from an asparagine residue.
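
The base difference on the noncoding strand can be checked with a few lines of Python. This sketch is illustrative only; the helper name `reverse_complement` and the variable names are hypothetical, and only the two codons stated above are used.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the opposite DNA strand, written 5'->3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


# Codon 12 of H-ras on the coding strand, as stated in the text
CODON_12 = {"wild_type": "GGC", "mutant": "GTC"}
AMINO_ACID = {"GGC": "Gly", "GTC": "Val"}   # standard genetic code (G12V)

# 'Anticodon' 12 on the noncoding strand
anticodons = {name: reverse_complement(codon) for name, codon in CODON_12.items()}

# Index of the base at which the two anticodons differ
differing = [i for i in range(3)
             if anticodons["wild_type"][i] != anticodons["mutant"][i]]
```

The wild-type 'anticodon' 12 is GCC and the mutant is GAC, so the two differ only at the central base, cytosine in the wild type and adenine in the mutant.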

The first finger of the designer lead peptide is designed according to the rules set forth
herein starting from a Zif268 finger 2 model to bind the quadruplet 5'-GCCG-3', which
corresponds to 'anticodon' 10 of the designated binding site plus one 3' base. The finger
has the following sequence:

  F  Q  C  R  I  C  M  R  N  F  S  D  R  S  S  L  T  R  H  T  R  T  H  T  G  E  K  P
                                  -1  1  2  3  4  5  6  7  8  9
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According to the recognition rules, the first finger of the lead peptide could contact
cytosine using one of Asp, Glu, Ser or Thr in the third α-helix position. To determine the
optimal contact, the codon for helical position 3 of finger 1 is engineered by cassette
mutagenesis to have position 1 = A/G, position 2 = A/C/G and position 3 = C/G. Therefore
in addition to Asp, Glu, Ser and Thr, the randomisation also specifies Ala, Arg, Asn, Gly
and Lys. Selections from this mini-library are over one round of phage binding to 5nM
mutant DNA oligo in 100 µl PBS containing 50µM ZnCl2, 2% (w/v) fat-free dried milk
(Marvel) and 1% (v/v) Tween-20, with 1µg poly dIdC as competitor, followed by six
washes with PBS containing 50µM ZnCl2 and 1% (v/v) Tween-20. Bound phage are eluted
with 0.1M triethylamine for 3 mins, and immediately transferred to an equal volume of 1M
Tris-Cl pH 7.4.
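
The set of amino acids specified by the randomised codon can be verified by enumerating the twelve possible codons against the standard genetic code. The Python sketch below is illustrative only; the names `CODON_TABLE` and `encoded` are hypothetical.

```python
# Standard genetic code, with bases ordered T, C, A, G at each codon position
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# Randomised codon: position 1 = A/G, position 2 = A/C/G, position 3 = C/G
codons = [a + b + c for a in "AG" for b in "ACG" for c in "CG"]

# One-letter codes of every amino acid the mini-library can encode
encoded = {CODON_TABLE[codon] for codon in codons}
```

The twelve codons encode nine amino acids (A, D, E, G, K, N, R, S and T), that is Asp, Glu, Ser and Thr together with Ala, Arg, Asn, Gly and Lys, and no stop codon.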

A single round of randomisation and selection is found to be sufficient to improve the
affinity of the lead zinc finger peptide to this standard. A small library of mutants is
constructed with limited variations specifically in the third α-helical position (+3) of finger
1 of the designed peptide. Selection from this library yields an optimised DNA-binding
domain with asparagine at the variable position, which is able to bind the mutant ras
sequence with an apparent Kd of 3nM, i.e. equal to that of the wild-type Zif268 DNA-binding
domain (Fig. 2). The selection of asparagine at this position to bind opposite a
cytosine is an unexpected deviation from the recognition rules, which normally pair
asparagine with adenine.

The selection of asparagine is, however, consistent with physical considerations of the 
protein-DNA interface. In addition to the classical bidentate interaction of asparagine and 
adenine observed in zinc finger-DNA complexes, asparagine has been observed to bridge a 



stereochemical pairings of hydrogen bond donors and acceptors which
asparagine, including the underlined step GCC of ras 'anticodon' 10. Although asparagine
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removed by washing the beads 6 times with PBS containing 50µM ZnCl2 and 1% (v/v)
Tween-20. The beads are subsequently incubated for 1h at RT with anti-M13 IgG
conjugated to horseradish peroxidase (Pharmacia Biotech) diluted 1:5000 in PBS containing
50µM ZnCl2 and 2% (w/v) fat-free dried milk (Marvel). Excess antibody is removed by
washing 6 times with PBS containing 50µM ZnCl2 and 0.05% (v/v) Tween, and 5 times
with PBS containing 50µM ZnCl2. The ELISA is developed with 0.1mg/ml
tetramethylbenzidine (Sigma) in 0.1M sodium acetate pH5.4 containing 2µl of fresh 30%
hydrogen peroxide per 10ml buffer, and after approximately 1 min, stopped with an equal
volume of 2M H2SO4. The reaction produces a yellow colour which is quantitated by

subtracting the absorbance at 650nm from the absorbance at 450nm. It should be noted that
in this protocol the ELISA is not made competitive; however, soluble (non-biotinylated)
wild-type ras DNA could be included in the binding reactions, possibly leading to higher
discrimination between wild-type and mutant ras.

Phage are retained specifically by DNA bearing the mutant, but not the wild-type ras
sequence, allowing the detection of the point mutation by ELISA (Fig. 3).
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Example 4 

Design of an anti-HIV zinc finger

The sequence of the HIV TAR, the region of the LTR which is responsible for trans-activation
by Tat, is known (Jones and Peterlin, (1994) Ann. Rev. Biochem. 63:717-743).
A sequence within the TAT region is identified and a zinc finger polypeptide designed to
bind thereto.

The selected sequence is 5' - AGA GAG CTC - 3', which is the complement of nucleotides
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vector. Electrocompetent TG1 cells are transformed with the recombinant vector. Single
colonies of transformants are grown overnight in 2xTY containing 50µM ZnCl2, 15µg/ml
tetracycline. Single stranded DNA is prepared from phage in the culture supernatant and
sequenced with Sequenase 2.0 (United States Biochemical). 

The polypeptide designed according to the invention is then tested for binding to HIV DNA 
and positive results are obtained. 



Example 5 

Alanine mutagenesis of the Asp2 in finger 3 is carried out on the wild-type Zif268 DNA-binding
domain and four related peptides isolated from the phage display library as follows
(see also Fig. 5):

E. coli TG1 cells are transfected with fd phage displaying zinc fingers. Colony PCR is
performed with one primer containing a single mismatch to create the Asp to Ala change in
finger 3. Cloning of PCR product in phage vector is as described previously (Choo, Y. &
Klug, A. (1994) Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A.
(1994) Proc. Natl. Acad. Sci. USA 91, 11168-11172). Briefly, forward and backward
PCR primers contained unique restriction sites for Not I or Sfi I respectively and amplified
an approximately 300 base pair region encompassing three zinc fingers. PCR products are
digested with Sfi I and Not I to create cohesive ends and are ligated to 100ng of similarly
digested fd-Tet-SN vector. Electrocompetent TG1 cells are transformed with the
recombinant vector. Single colonies of transformants are grown overnight in 2xTY
containing 50µM ZnCl2, 15µg/ml tetracycline. Single stranded DNA is prepared from
phage in the culture supernatant and sequenced with Sequenase 2.0 (United States



position 6 of the middle finger. Peptide F2-Arg, which contains Arg at position 6 of finger

would specify 5'-G in the 'middle' cognate triplet regardless of the
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2. As would be expected, according to the hypothesis set out in the introduction, the
mutation affects binding at the 5' position, while the specificity at the middle and 3' 
position remains unchanged. 

The mutation generally leads to a broadening of specificity, for instance in Zif268 where 
removal of Asp2 in finger 3 results in a protein which is unable to discriminate the 5' base 
of the middle triplet (Fig. 6a). However, the expectation that a new 5' base-specificity for 
the mutants might correlate to the identity of position 6 in finger 2 is not borne out. For 
example, F2-Gly would be expected to lose sequence discrimination but, although specificity 
is adversely affected, a slight preference for T is discernible (Fig. 6b). Similarly, F2-Val 
and F2-Asn, which might have been expected to acquire specificity for one nucleotide, 
instead have their specificities altered by the mutation (Fig. 6c, d): the F2-Val mutant 
allows G, A and T but not C, and the F2-Asn mutant appears to discriminate against both 
pyrimidines. In the absence of a larger database it is not possible to deduce whether these 
apparent specificities are the result of amino acid-base contacts from position 6 of finger 2, 
and if so whether these are general interactions which should be regarded as recognition 
rules. The apparent discrimination of F2-Gly in particular suggests that this is unlikely to 
be the case, but rather that in these particular examples, other mechanisms are involved in 
determining sequence bias. 

In contrast to the loss of discrimination seen for the other four peptides, F2-Arg continues 
to specify guanine in the 5' position of the middle triplet regardless of the mutation in 
finger 3 (Fig. 6e). In this case, the specificity is derived from the strong interaction between 
guanine and Arg6 in finger 2. This contact has been observed a number of times in zinc 
finger co-crystal structures (Pavletich, N. P. & Pabo, C. O. (1993) Science 261, 1701- 
1707; Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. & Rhodes, D. (1993) 
Nature 366, 483-487; Fairall, L., Schwabe, J. W. R., Chapman, L., Finch, J. T. 

identity at position 6 to a nucleotide preference at the 5' position of a cognate triplet (Choo, 
Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125). This interaction is compatible 
To determine the contribution of Asp2 in finger 3 to the binding strength, apparent 
equilibrium dissociation constants are determined for Zif268 and F2-Arg before and after 
the Ala mutation (Fig. 7). Procedures are as described previously (Choo and Klug, 1994). 
Briefly, appropriate concentrations of 5'-biotinylated DNA binding sites are added to equal 
volumes of phage solution described above. Binding is allowed to proceed for one hour at 
20°C. DNA is captured with streptavidin-coated paramagnetic beads (500 ng/well). The 
beads are washed 6 times with PBS/Zn containing 1% Tween, then 3 times with PBS/Zn. 
Bound phage are detected by ELISA with horseradish peroxidase-conjugated anti-M13 IgG 
(Pharmacia Biotech) and quantitated using SOFTMAX 2.32 (Molecular Devices). Binding 
data are plotted and analysed using KaleidaGraph (Abelbeck Software). 
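
An apparent equilibrium dissociation constant can be estimated from titration data of this kind by fitting a single-site binding curve. The following is a minimal sketch only: it assumes a simple hyperbolic model, signal = Bmax x [DNA] / (Kd + [DNA]), which is not stated in the procedure above, and the function name `fit_single_site` and the data points are hypothetical. It is not the KaleidaGraph analysis used in this Example.

```python
def fit_single_site(conc, signal, kd_min=1e-3, kd_max=1e3, steps=2000):
    """Least-squares fit of signal = bmax * c / (kd + c).

    Assumed model, for illustration only. For each trial kd on a
    log-spaced grid the best bmax has a closed form (linear least
    squares), so only kd needs to be scanned.
    """
    best = None
    for i in range(steps + 1):
        kd = kd_min * (kd_max / kd_min) ** (i / steps)
        frac = [c / (kd + c) for c in conc]  # fractional occupancy at each concentration
        bmax = sum(f * s for f, s in zip(frac, signal)) / sum(f * f for f in frac)
        sse = sum((s - bmax * f) ** 2 for f, s in zip(frac, signal))
        if best is None or sse < best[0]:
            best = (sse, kd, bmax)
    return best[1], best[2]


# Synthetic, noise-free points generated from kd = 4.0 and bmax = 1.0
# (arbitrary units); these are NOT measurements from the Example.
conc = [0.5, 1, 2, 4, 8, 16, 32]
signal = [c / (4.0 + c) for c in conc]

kd, bmax = fit_single_site(conc, signal)
print(f"apparent Kd ~ {kd:.2f}, Bmax ~ {bmax:.2f}")
```

Fitting the wild-type and Ala-mutant titrations separately and taking the ratio of the two fitted Kd values would give the fold change in affinity.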



Both mutants show approximately a four-fold reduction in affinity for their respective 
binding sites under the conditions used. The reduction is likely a direct result of abolishing 
contacts from Asp2, rather than a consequence of changes in binding specificity at the 5' 
position of the middle triplet, since the mutant Zif268 loses all specificity while F2-Arg 
registers no change in specificity. However, note that two stabilising interactions are 
abolished: an intramolecular buttressing interaction with Arg-1 on finger 3 and also the 
intermolecular contact with the secondary DNA strand. An independent comparison of 
wild-type Zif268 binding to its consensus binding site flanked by G/T or A/C also found a 
five-fold reduction in affinity for those sites which are unable to satisfy a contact from 
Asp2 to the secondary DNA strand (Swirnoff, A. H. & Milbrandt, J. (1995) Mol. Cell. 
Biol. 15, 2275-2287). While the effects of perturbations in the DNA structure cannot be 
discounted in this case, the results of both experiments would seem to suggest that the 
reduction in binding affinity results from loss of the protein-DNA contact. Nevertheless, 
the intramolecular contact between positions -1 and 2 in a zinc finger is a further level of 
synergy which may have to be taken into account before the full picture emerges, 

networks of contacts which occur at the protein-DNA interface in 
(iv) ANTI-SENSE: NO 

(ix) FEATURE: 
(A) NAME/KEY: CDS 
(B) LOCATION: 1..264 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1 



GCA GAR GAG AAG CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC     48
Ala Glu Glu Lys Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser
1               5                   10                  15

GAT CGT AGT AGT CTT ACC CGC CAC ACG AGG ACC CAC ACA GGC GAG AAG     96
Asp Arg Ser Ser Leu Thr Arg His Thr Arg Thr His Thr Gly Glu Lys
            20                  25                  30

CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC AGG AGC GAT AAC    144
Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp Asn
        35                  40                  45

CTT ACG AGA CAC CTA AGG ACC CAC ACA GGC GAG AAG CCT TTT CAG TGT    192
Leu Thr Arg His Leu Arg Thr His Thr Gly Glu Lys Pro Phe Gln Cys
    50                  55                  60

CGA ATC TGC ATG CGT AAC TTC AGG CAA GCT GAT CAT CTT CAA GAG CAC    240
Arg Ile Cys Met Arg Asn Phe Arg Gln Ala Asp His Leu Gln Glu His
65                  70                  75                  80

NNN AAG ACC CAC ACA GGC GAG AAG                                    264
Xaa Lys Thr His Thr Gly Glu Lys
                85
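
The correspondence between the codon rows and the amino acid rows of the listing can be checked by translating with the standard genetic code. The following is a minimal sketch using the third codon row of the listing (nucleotides 97 to 144) as input; the helper name `translate` is an assumption made for illustration.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna):
    """Translate a DNA string (whitespace ignored) into one-letter amino acids."""
    dna = "".join(dna.split()).upper()
    usable = len(dna) - len(dna) % 3  # ignore any trailing partial codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Third codon row of SEQ ID NO: 1, as printed in the listing.
row3 = "CCT TTT CAG TGT CGA ATC TGC ATG CGT AAC TTC AGC AGG AGC GAT AAC"
print(translate(row3))  # PFQCRICMRNFSRSDN
```

The one-letter result corresponds to Pro Phe Gln Cys Arg Ile Cys Met Arg Asn Phe Ser Arg Ser Asp Asn, in agreement with the amino acid row printed beneath those codons.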
Claims 



1. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger 
class capable of binding to a nucleic acid quadruplet in a target nucleic acid sequence, 
wherein binding to base 4 of the quadruplet by an α-helical zinc finger nucleic acid binding 
motif in the protein is determined as follows: 

a) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and position 
++2 is not Asp; 

b) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any residue, as 
long as position ++2 in the α-helix is not Asp. 

2. A method according to claim 1, wherein binding to base 4 of the quadruplet by an 
α-helical zinc finger nucleic acid binding motif in the protein is additionally determined as 
follows: 

c) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 
is Ser or Thr and position ++2 is Asp; 

d) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and 
position ++2 is Asp. 
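
Rules (a) to (d) of claims 1 and 2 can be read as a lookup from base 4 to the residues permitted at position +6 and the constraint on position ++2. The following is a minimal sketch of that reading; the names `RULES` and `satisfies_base4_rule` are hypothetical, residues are given as three-letter codes, and the sketch covers only the base 4 rules, not the rules for the other bases of the quadruplet.

```python
# For each base 4: a list of alternatives, each a pair of
#   (residues allowed at +6, or None for "any residue",
#    constraint on ++2: "Asp", "not Asp" or "unconstrained").
RULES = {
    "A": [({"Gln"}, "not Asp")],                                # rule (a)
    "C": [(None, "not Asp")],                                   # rule (b)
    "G": [({"Arg"}, "unconstrained"), ({"Ser", "Thr"}, "Asp")], # rule (c)
    "T": [({"Ser", "Thr"}, "Asp")],                             # rule (d)
}


def satisfies_base4_rule(base4, pos6, pos_pp2):
    """Return True if the residues at +6 and ++2 fit a rule for base 4."""
    for allowed6, pp2_rule in RULES[base4.upper()]:
        ok6 = allowed6 is None or pos6 in allowed6
        ok2 = (
            pp2_rule == "unconstrained"
            or (pp2_rule == "Asp" and pos_pp2 == "Asp")
            or (pp2_rule == "not Asp" and pos_pp2 != "Asp")
        )
        if ok6 and ok2:
            return True
    return False


print(satisfies_base4_rule("A", "Gln", "Ser"))  # True: rule (a)
print(satisfies_base4_rule("A", "Gln", "Asp"))  # False: ++2 must not be Asp
print(satisfies_base4_rule("G", "Arg", "Asp"))  # True: first alternative of rule (c)
print(satisfies_base4_rule("T", "Ser", "Ala"))  # False: rule (d) requires Asp at ++2
```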

3. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger 
class capable of binding to a nucleic acid quadruplet in a target nucleic acid sequence, 
wherein binding to each base of the quadruplet by an α-helical zinc finger nucleic acid 
binding motif in the protein is determined as follows: 

++2 is not Asp; 
7. A method according to any one of claims 4 to 6 wherein is T or L 

8. A method according to any one of claims 4 to 7 wherein X2.5 is G-K-A, G-K-C, G- 
K-S, G-K-G, M-R-N or M-R. 

9. A method according to any one of claims 4 to 8 wherein the linker is T-G-E-K or T- 
G-E-K-P. 

10. A method according to any one of claims 4 to 9 wherein position +9 is R or K. 

11. A method according to any one of claims 4 to 10 wherein positions +1, +5 and +8 
are not occupied by any one of the hydrophobic amino acids, F, W or Y. 

12. A method according to claim 11 wherein positions +1, +5 and +8 are occupied by 
the residues K, T and Q respectively. 

13. A method for preparing a nucleic acid binding protein of the Cys2-His2 zinc finger 
class capable of binding to a target nucleic acid sequence, comprising the steps of: 

a) selecting a model zinc finger domain from the group consisting of naturally occurring 
zinc fingers and consensus zinc fingers; and 

b) mutating the finger according to the rules set out in any one of claims 1 to 3. 

14. A method according to claim 13, wherein the model zinc finger is a consensus zinc 
finger whose structure is selected from the group consisting of the consensus structure P Y 
22. A method according to claim 21, comprising the steps of: 

a) preparing a nucleic acid construct capable of expressing a fusion protein comprising the 
nucleic acid binding protein and a minor coat protein of a filamentous bacteriophage; 

b) preparing further nucleic acid constructs capable of expressing a fusion protein 
comprising a selectively mutated nucleic acid binding protein and a minor coat protein of 
a filamentous bacteriophage; 

c) causing the fusion proteins defined in steps (a) and (b) to be expressed on the surface of 
bacteriophage transformed with the nucleic acid constructs; 

d) assaying the ability of the bacteriophage to bind the target nucleic acid sequence and 
selecting the bacteriophage demonstrating superior binding characteristics. 



23. A method according to any one of claims 20 to 22 wherein the nucleic acid binding 
protein is selectively randomised at any one of positions +1, +5, +8, -1, +2, +3 or +6. 
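
As an illustration of the scale involved when helix positions are randomised, the following sketch counts the protein variants obtained when a chosen subset of the listed positions is varied. It assumes that each randomised position samples all 20 standard amino acids, which the claim does not state; the function name `library_size` is hypothetical.

```python
# Helix positions named in the claim as candidates for randomisation.
POSITIONS = ("-1", "+1", "+2", "+3", "+5", "+6", "+8")


def library_size(randomised, alphabet=20):
    """Number of distinct protein variants for the given randomised positions.

    Assumes every randomised position independently samples `alphabet`
    residues (20 standard amino acids by default).
    """
    chosen = set(randomised)
    unknown = chosen - set(POSITIONS)
    if unknown:
        raise ValueError(f"positions not listed in the claim: {sorted(unknown)}")
    return alphabet ** len(chosen)


print(library_size(["+6"]))                    # 20
print(library_size(["-1", "+2", "+3", "+6"]))  # 160000
print(library_size(POSITIONS))                 # 1280000000
```

The count grows as a power of the number of randomised positions, which is one reason a selective choice of positions keeps a display library within a practical size.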

24. A method for determining the presence of a target nucleic acid molecule, 
comprising the steps of: 

a) preparing a nucleic acid binding protein by the method of any preceding claim which is 
specific for the target nucleic acid molecule; 

b) exposing a test system comprising the target nucleic acid molecule to the nucleic acid 
binding protein under conditions which promote binding, and removing any nucleic acid 
binding protein which remains unbound; 

c) detecting the presence of the nucleic acid binding protein in the test system. 



wherein the presence of the nucleic acid binding 

26. A method according to claim 24 or claim 25 wherein the nucleic acid binding 
protein is displayed on the surface of a filamentous bacteriophage and the presence 